In [1]:
%%HTML
<style>
.container { width:100% }
</style>
The global variable Cache is used as a cache for the function value defined later.
In [2]:
Cache = {}
In order to have some variation in our game, we use random numbers to choose between optimal moves.
In [3]:
import random
random.seed(2)
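Seeding the generator makes the "random" tie-breaking between optimal moves reproducible. A minimal sketch (the list of candidate moves here is made up for illustration):

```python
import random

# With a fixed seed, the sequence of random choices is reproducible:
random.seed(2)
first  = [random.choice(['a', 'b', 'c']) for _ in range(5)]
random.seed(2)
second = [random.choice(['a', 'b', 'c']) for _ in range(5)]

print(first == second)  # → True
```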
Given a player p, the function other(p) computes the opponent of p. This assumes that there are only two players and that the set of all players is stored in the global variable Players.
In [4]:
def other(p):
    return [o for o in Players if o != p][0]
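For illustration, assume the two players are 'X' and 'O'; in the actual game notebooks, Players is defined when the game is loaded. A self-contained sketch:

```python
Players = ['X', 'O']   # assumption for this sketch; normally set by the game notebook

def other(p):
    # return the unique player different from p
    return [o for o in Players if o != p][0]

print(other('X'))  # → O
print(other('O'))  # → X
```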
The module heapq implements heaps, which we use as priority queues.
In [5]:
import heapq
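A heap built with heapq is a min-heap: heappop always returns the entry with the smallest priority first. A quick sketch:

```python
import heapq

# heapq implements a binary min-heap on a plain list:
# the entry with the smallest priority is always popped first.
Moves = []
heapq.heappush(Moves, (0.5, 'b'))
heapq.heappush(Moves, (-1.0, 'a'))
heapq.heappush(Moves, (0.0, 'c'))

order = []
while Moves:
    prio, item = heapq.heappop(Moves)
    order.append(item)

print(order)  # → ['a', 'c', 'b']
```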
The function value takes six arguments:
- State is the current state of the game,
- player is a player,
- limit determines the lookahead. More precisely, it is the number of half-moves that are investigated to compute the value. If limit is 0, the state is evaluated using heuristic,
- heuristic is a function that takes a state and estimates the value of that state,
- alpha is the worst result that can happen to player,
- beta is the best result that can happen to player.

The function value returns the value that the given State has for player if both players play optimally. This value is an element of the set $\{-1, 0, 1\}$.
- If player can force a win, the return value is 1.
- If player can at best force a draw, the return value is 0.
- If player might lose even when playing optimally, the return value is -1.

For reasons of efficiency, this function is memoized.
In [6]:
def value(State, player, limit, heuristic, alpha=-1, beta=1):
    global Cache
    if (State, limit) in Cache:
        val, a, b = Cache[(State, limit)]
        if a <= alpha and beta <= b:
            return val
        else:
            alp = min(alpha, a)
            bet = max(beta , b)
            val = alphaBeta(State, player, limit, heuristic, alp, bet)
            Cache[(State, limit)] = val, alp, bet
            return val
    else:
        val = alphaBeta(State, player, limit, heuristic, alpha, beta)
        Cache[(State, limit)] = val, alpha, beta
        return val
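The cached triple (val, a, b) records the alpha-beta window the value was computed with, and it is only safe to reuse when the cached window contains the query window; otherwise the value is recomputed with a widened window. The following toy sketch illustrates why: all names here are hypothetical, and solve stands in for alphaBeta, returning the true value clamped to the window, as fail-soft alpha-beta pruning can do.

```python
ToyCache  = {}
TrueValue = {'s': 1}   # made-up game with a single state of true value 1

def solve(state, alpha, beta):
    # stand-in for alphaBeta: the true value clamped to the window,
    # mimicking the effect of pruning with a too-narrow window
    return max(alpha, min(beta, TrueValue[state]))

def toy_value(state, alpha, beta):
    if state in ToyCache:
        val, a, b = ToyCache[state]
        if a <= alpha and beta <= b:    # cached window contains the query window
            return val
        alpha, beta = min(alpha, a), max(beta, b)   # widen and recompute
    val = solve(state, alpha, beta)
    ToyCache[state] = val, alpha, beta
    return val

print(toy_value('s', -1, 0))  # → 0   (value clamped by the narrow window)
print(toy_value('s', -1, 1))  # → 1   (window widened, exact value found)
```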
The function value_cache returns the value stored in the Cache for the given State and limit. If no value is stored, it returns 0.
In [7]:
def value_cache(State, limit):
    val, _, _ = Cache.get((State, limit), (0, -1, 1))
    return val
In [8]:
def alphaBeta(State, player, limit, heuristic, alpha=-1, beta=1):
    if finished(State):
        return utility(State, player)
    if limit == 0:
        return heuristic(State, player)
    val        = alpha
    NextStates = next_states(State, player)
    Moves      = []  # empty priority queue
    for ns in NextStates:
        # no '-' in front of value, as the smallest value has the highest priority
        # limit-3 is the limit that was used in the previous iteration
        heapq.heappush(Moves, (value_cache(ns, limit-3), ns))
    while Moves:
        _, ns = heapq.heappop(Moves)
        val = max(val, -value(ns, other(player), limit-1, heuristic, -beta, -alpha))
        if val >= beta:
            return val
        alpha = max(val, alpha)
    return val
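To see the negamax/alpha-beta scheme in isolation, here is a self-contained miniature for the game of Nim with a single pile: a player may take one, two, or three matches, and the player who cannot move loses. The game interface below is made up for this sketch only and omits the memoization and move ordering used above.

```python
def nim_next_states(n):
    # take 1, 2, or 3 matches from a pile of n
    return [n - k for k in (1, 2, 3) if n - k >= 0]

def nim_value(n, alpha=-1, beta=1):
    if n == 0:
        return -1              # player to move cannot move and has lost
    val = alpha
    for ns in nim_next_states(n):
        # negamax: the value for us is the negated value for the opponent,
        # searched with the negated, swapped window
        val = max(val, -nim_value(ns, -beta, -alpha))
        if val >= beta:
            return val         # beta cutoff: opponent avoids this line anyway
        alpha = max(val, alpha)
    return val

print([nim_value(n) for n in range(1, 6)])  # → [1, 1, 1, -1, 1]
```

A pile of 4 is lost for the player to move: whatever she takes, the opponent moves to 0.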
The function best_move takes four arguments:
- State is the current state of the game,
- player is a player,
- limit determines the lookahead,
- heuristic is the function used to evaluate states when the lookahead is exhausted.

The function best_move returns a pair of the form $(v, s)$ where $s$ is a state and $v$ is the value of this state. The state $s$ is a state that is reached from State if player makes one of her optimal moves. In order to have some variation in the game, the function randomly chooses one of the optimal moves.
In [9]:
def best_move(State, player, limit, heuristic):
    NS        = next_states(State, player)
    bestVal   = value(State, player, limit, heuristic)
    BestMoves = [s for s in NS if -value(s, other(player), limit-1, heuristic) == bestVal]
    BestState = random.choice(BestMoves)
    return bestVal, BestState
The next line imports the function IPython.display.clear_output, which is needed to clear the output of a cell.
In [10]:
import IPython.display
In [11]:
import time
The function play_game plays a game on the given canvas. The game played is specified indirectly by the following global definitions:
- Start is a global variable defining the start state of the game,
- next_states is a function such that $\texttt{next\_states}(s, p)$ computes the set of all possible states that can be reached from state $s$ if player $p$ is next to move,
- finished is a function such that $\texttt{finished}(s)$ is true for a state $s$ if the game is over in state $s$,
- utility is a function such that $\texttt{utility}(s, p)$ returns either -1, 0, or 1 in the terminal state $s$, depending on whether player $p$ has lost, drawn, or won the game.
In [12]:
def play_game(canvas, limit, heuristic):
    global Cache
    Cache   = {}
    State   = Start
    History = []
    while True:
        firstPlayer = Players[0]
        start = time.time()
        val, State = best_move(State, firstPlayer, limit, heuristic)
        stop  = time.time()
        diff  = round(stop - start, 2)
        History.append(diff)
        draw(State, canvas, f'{diff} seconds, value = {round(val, 2)}.')
        if finished(State):
            final_msg(State)
            break
        IPython.display.clear_output(wait=True)
        State = get_move(State)
        draw(State, canvas, '')
        if finished(State):
            IPython.display.clear_output(wait=True)
            final_msg(State)
            break
    for i, d in enumerate(History):
        print(f'{i}: {d} seconds')
In [ ]:
%run Tic-Tac-Toe.ipynb
With memoization, computing the value of the start state takes 95 ms. Without memoization, it takes 5 seconds.
In [ ]:
%%time
val = value(Start, 'X', 6, heuristic)
We check how many different states are stored in the Cache
. Without alpha-beta pruning, we had to inspect 5478 different states.
In [ ]:
len(Cache)
Let's draw the board.
In [ ]:
canvas = create_canvas(Start)
draw(Start, canvas, f'Current value of game for "X": {val}')
Now it's time to play. In the input window that will pop up later, enter your move in the format "row,col" with no space between row and column.
In [ ]:
play_game(canvas, 6, heuristic)
In [13]:
%run Connect-Four.ipynb
In [14]:
%%time
val = value(Start, 'X', 10, heuristic)
In [15]:
len(Cache)
In [16]:
canvas = create_canvas(Start)
draw(Start, canvas, f'Current value of game for "X": {round(val, 2)}')
In [17]:
play_game(canvas, 9, heuristic)
In [ ]:
len(Cache)
In [ ]: